-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace btree
with btree_gin
indexes for PostgreSQL to allow larger metadata and faster nested search
#588
Conversation
Point of clarification: in the end, was the issue about the size of a given key or was it the overall size of the document? I thought it seemed to be the former but turned out to be the latter, but I could be misremembering or behind. Worth nailing that down here for the record in case we revisit later. |
tiled/catalog/adapter.py
Outdated
elif dialect_name == "postgresql" and operation == operator.eq: | ||
condition = orm.Node.metadata_.op("@>")( | ||
type_coerce( | ||
{keys[0]: reduce(lambda x, y: {y: x}, keys[1:][::-1], query.value)}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clever. Can you refactor this into a utility function so that it can be precisely targeted with a unit test? It might also end up being reused in the future, elsewhere.
CI shows that this test is failing: tiled/tiled/_tests/test_catalog.py Line 160 in 7c122f9
I think this test (mine) was not the right approach. The query planner's internal details are slippery, and may depend on the number of records, perhaps the available system resources, the version of Postgres or SQLite... I think it would be best to develop a separate test harness, which may or may not use pytest, to evaluate index usage. Like performance benchmarks, this is just a different category of test and does not sit well inside the unit test suite. I think I'd be in favor of just ripping this out for now, but I'm open to alternatives. |
@jmaruland Your (C2QA) use case is one motivation for this fix. At your convenience, would you check out this PR branch locally, from @Kezzsim's fork, start a temp catalog server, and test that you can insert your data? |
Technically it isn't key related, it's overall metadata size related. Here's an example error from importing a bluesky run:
I will remove the term |
btree
with btree_gin
indexes for PostgreSQL to allow larger metadata keys and faster nested searchbtree
with btree_gin
indexes for PostgreSQL to allow larger metadata and faster nested search
Co-authored-by: Dan Allan <daniel.b.allan@gmail.com>
Once the CI passing, we must remember to add a database migration that will drop the old index and replace it with the new one, on existing deployments. (I can pair-code this on Zoom, @Kezzsim; the process is not too complicated, but a little obscure.) |
The title explains it all, this PR is intended to have the following effects:
btree
index type the operation would fail with an error.eq
binary operation for queries like.search(Key("start.purpose") == "test")
to utilize the new index for rapid traversals, greatly reducing the time needed to query when searching for nested strings in metadata objects.Caveats:
ne
,lt
,le
,gt
,ge
leading to slow seq scans in those conditions. Hypothetically this can be achieved but there is no known way currently.eq
operator is technically being converted to the postgresql@>
operator which is technically anIN
, this may have unintended side effects.